Parallel computation of phylogenetic consensus trees

Authors

  • Andre J. Aberer
  • Nicholas D. Pattengale
  • Alexandros Stamatakis
Abstract

The field of bioinformatics is witnessing a rapid and overwhelming accumulation of molecular sequence data, predominantly driven by novel wet-lab sequencing techniques. This trend poses scalability challenges for tool developers. In the field of phylogenetic inference (reconstruction of evolutionary trees from molecular sequence data), scalability is becoming an increasingly important issue for operations other than the tree reconstruction itself. In this paper we focus on a post-analysis task in reconstructing very large trees, specifically the step of building (extended) majority rules consensus trees from a collection of equally plausible trees or a collection of bootstrap replicate trees. To this end, we present sequential optimizations that establish our implementation as the current fastest exact implementation in phylogenetics, and our novel parallelized routines are the first of their kind. Our sequential optimizations achieve a performance improvement of a factor of 50 compared to the previous version of our code, and we achieve a maximum speedup of 5.5 on an 8-core Nehalem node for building consensus trees on trees comprising up to 55,000 organisms. The methods developed here are integrated into the widely used open-source tool RAxML for phylogenetic tree...
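
As an illustration of the consensus step the abstract refers to, the following is a minimal Python sketch of plain majority rule consensus on a toy input. It is not the RAxML implementation described in the paper. It assumes rooted trees written as nested tuples of taxon names, and the function names `clades` and `majority_rule_clades` are hypothetical. A group of taxa is kept only if it appears in more than half of the input trees.

```python
from collections import Counter


def clades(tree):
    """Return (leaf_set, clades) for a rooted tree given as nested tuples.

    Assumption for this sketch: a leaf is a string and an inner node is a
    tuple of subtrees. Each clade is the frozenset of taxa below one node.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def majority_rule_clades(trees, threshold=0.5):
    """Map each clade found in more than `threshold` of the trees to its frequency."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        # The clade holding every taxon is trivial, so it is skipped.
        counts.update(c for c in set(found) if c != all_leaves)
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k / n > threshold}


# Three toy trees over the taxa A to E (hypothetical data).
example_trees = [
    ((("A", "B"), "C"), ("D", "E")),
    (("A", "B"), ("C", ("D", "E"))),
    ((("A", "C"), "B"), ("D", "E")),
]

if __name__ == "__main__":
    for clade, freq in majority_rule_clades(example_trees).items():
        print(sorted(clade), round(freq, 2))
```

Counting each clade once per tree and filtering by frequency is the core of the task. Implementations aimed at very large tree collections generally work on bipartitions of unrooted trees and rely on compact encodings and hashing, which this sketch leaves out.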

Similar resources

A fully resolved consensus between fully resolved phylogenetic trees.

Nowadays, there are many phylogeny reconstruction methods, each with advantages and disadvantages. We explored the advantages of each method, putting together the common parts of trees constructed by several methods, by means of a consensus computation. A number of phylogenetic consensus methods are already known. Unfortunately, there is also a taboo concerning consensus methods, because most b...

The transposition distance for phylogenetic trees

The search for similarity and dissimilarity measures on phylogenetic trees has been motivated by the computation of consensus trees, the search by similarity in phylogenetic databases, and the assessment of clustering results in bioinformatics. The transposition distance for fully resolved phylogenetic trees is a recent addition to the extensive collection of available metrics for comparing phy...

Reconstruction of Maximum Likelihood Phylogenetic Trees in Parallel Environment Using Logic Programming

With the rapid increase of nucleotide and amino acid sequence data, reliable and flexible application programs are required to infer molecular phylogenetic trees. The maximum likelihood method is known to be robust among the many methods for reconstructing molecular phylogenetic trees; however, it has an extremely high computational cost. Although parallel computation is a good...

Point estimates in phylogenetic reconstructions

MOTIVATION: The construction of statistics for summarizing posterior samples returned by a Bayesian phylogenetic study has so far been hindered by the poor geometric insights available into the space of phylogenetic trees, and ad hoc methods such as the derivation of a consensus tree make up for the ill-definition of the usual concepts of posterior mean, while bootstrap methods mitigate the absen...

Distributed and parallel algorithms and systems for inference of huge phylogenetic trees based on the maximum likelihood method

The computation of large phylogenetic (evolutionary) trees from DNA sequence data based on the maximum likelihood criterion is most probably NP-complete. Furthermore, the computation of the likelihood value for one single potential tree topology is computationally intensive. This thesis introduces a number of algorithmic and technical solutions which for the first time enable parallel inference...

Publication date: 2010